<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<title>Mali OpenCL SDK v1.1.0: FIR Filter</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
  $(document).ready(initResizable);
</script>
<link href="search/search.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="search/search.js"></script>
<script type="text/javascript">
  $(document).ready(function() { searchBox.OnSelectItem(0); });
</script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
 <tbody>
 <tr style="height: 56px;">
  <td id="projectlogo"><img alt="Logo" src="logo.png"/></td>
  <td style="padding-left: 0.5em;">
   <div id="projectname">Mali OpenCL SDK v1.1.0
   </div>
  </td>
 </tr>
 </tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.2 -->
<script type="text/javascript">
var searchBox = new SearchBox("searchBox", "search",false,'Search');
</script>
  <div id="navrow1" class="tabs">
    <ul class="tablist">
      <li><a href="index.html"><span>Home</span></a></li>
      <li class="current"><a href="pages.html"><span>Help&#160;and&#160;Tutorials</span></a></li>
      <li><a href="files.html"><span>Files</span></a></li>
      <li>
        <div id="MSearchBox" class="MSearchBoxInactive">
        <span class="left">
          <img id="MSearchSelect" src="search/mag_sel.png"
               onmouseover="return searchBox.OnSearchSelectShow()"
               onmouseout="return searchBox.OnSearchSelectHide()"
               alt=""/>
          <input type="text" id="MSearchField" value="Search" accesskey="S"
               onfocus="searchBox.OnSearchFieldFocus(true)" 
               onblur="searchBox.OnSearchFieldFocus(false)" 
               onkeyup="searchBox.OnSearchFieldChange(event)"/>
          </span><span class="right">
            <a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
          </span>
        </div>
      </li>
    </ul>
  </div>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
  <div id="nav-tree">
    <div id="nav-tree-contents">
      <div id="nav-sync" class="sync"></div>
    </div>
  </div>
  <div id="splitbar" style="-moz-user-select:none;" 
       class="ui-resizable-handle">
  </div>
</div>
<script type="text/javascript">
$(document).ready(function(){initNavTree('fir_tutorial.html','');});
</script>
<div id="doc-content">
<!-- window showing the filter options -->
<div id="MSearchSelectWindow"
     onmouseover="return searchBox.OnSearchSelectShow()"
     onmouseout="return searchBox.OnSearchSelectHide()"
     onkeydown="return searchBox.OnSearchSelectKey(event)">
<a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(0)"><span class="SelectionMark">&#160;</span>All</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(1)"><span class="SelectionMark">&#160;</span>Classes</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(2)"><span class="SelectionMark">&#160;</span>Files</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(3)"><span class="SelectionMark">&#160;</span>Functions</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(4)"><span class="SelectionMark">&#160;</span>Variables</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(5)"><span class="SelectionMark">&#160;</span>Macros</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(6)"><span class="SelectionMark">&#160;</span>Pages</a></div>

<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="javascript:void(0)" frameborder="0" 
        name="MSearchResults" id="MSearchResults">
</iframe>
</div>

<div class="header">
  <div class="headertitle">
<div class="title">FIR Filter </div>  </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p>Floating point FIR (Finite Impulse Response) image filter for pixelization and/or noise reduction.</p>
<h1><a class="anchor" id="firExampleResult"></a>
Example Result</h1>
<div class="image">
<img src="fir_float_input.bmp" alt="fir_float_input.bmp"/>
<div class="caption">
Input image</div></div>
 <div class="image">
<img src="fir_float_output.bmp" alt="fir_float_output.bmp"/>
<div class="caption">
Output image</div></div>
 <h1><a class="anchor" id="firAlgorithm"></a>
The Algorithm</h1>
<p>A FIR filter calculates each output value as a weighted sum (here, a weighted average) of a finite number of neighbouring input values.</p>
<p>To simplify the concept of FIR filtering, consider the one-dimensional signal [17 76 17 84 29], to which we will apply a 3x1 filter with the coefficients [3/15 9/15 3/15]. To keep the output values within the range of the input values, the sum of the coefficient numerators (here 3 + 9 + 3 = 15) should be less than or equal to the common coefficient denominator (here 15).</p>
<ol type="1">
<li>The filter is overlaid on top of the start of the signal and "reads" the values [17 76 17].</li>
<li><p class="startli">The output is the sum of the signal values multiplied by the corresponding filter coefficients.</p>
<p class="startli">[17 76 17] &rarr; [(17 * 3/15) + (76 * 9/15) + (17 * 3/15)] &rarr; [52.4] &rarr; [52] (truncated to an integer for simplicity).</p>
</li>
<li>The "kernel" (i.e. the filter as applied to the signal, sometimes also known as the "window") is then slid one step to the right where it reads [76 17 84].</li>
<li>The output is now: [(76 * 3/15) + (17 * 9/15) + (84 * 3/15)] &rarr; [42.2] &rarr; [42].</li>
<li>Next it reads [17 84 29] &rarr; [59.6] &rarr; [59]</li>
<li>Since we can no longer slide the kernel along the signal without going out of bounds we stop there and the full output signal is [52 42 59].</li>
</ol>
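<p>The sliding-window steps above can be sketched in plain C as follows. This is a minimal illustration and not code from the sample: the function name <em>fir_1d</em> and its parameters are hypothetical, and it assumes integer coefficient numerators with a common denominator and truncating division, to match the worked example.</p>

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of the SDK sample).
 * Applies a FIR filter with integer numerators `taps` and a common
 * denominator `denom` to `signal`. Writes signal_len - tap_count + 1
 * values to `out` and returns that count, or 0 if the signal is shorter
 * than the filter.
 */
size_t fir_1d(const int *signal, size_t signal_len,
              const int *taps, size_t tap_count, int denom,
              int *out)
{
    if (tap_count == 0 || signal_len < tap_count)
    {
        return 0;
    }

    const size_t out_len = signal_len - tap_count + 1;
    for (size_t i = 0; i < out_len; i++)
    {
        /* Weighted sum of the values currently under the window. */
        int sum = 0;
        for (size_t k = 0; k < tap_count; k++)
        {
            sum += signal[i + k] * taps[k];
        }
        /* Integer division truncates, e.g. 786 / 15 gives 52. */
        out[i] = sum / denom;
    }
    return out_len;
}
```

<p>Called with the signal [17 76 17 84 29], the numerators [3 9 3] and the denominator 15, this sketch writes [52 42 59], the same output signal as in the steps above.</p>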
<h1><a class="anchor" id="firImplementation"></a>
Implementation</h1>
<h2><a class="anchor" id="firSize"></a>
Image Size</h2>
<p>We have included a 512x512 input bitmap for use with this sample (to keep the size of the installer small). However, you are more likely to see performance improvements (when compared to C code running on a CPU) when larger images are used. There is some start-up overhead associated with using OpenCL. This overhead can outweigh the benefits of parallel processing when the input data sizes are small.</p>
<p>This sample has been coded to allow any input bitmap to be used. Simply replace <em>input.bmp</em> in the assets directory of the sample with the input image of your choice.</p>
<h2><a class="anchor" id="firPadding"></a>
Padding</h2>
<p>It is important to note that we have not considered padding here. The output image is two pixels smaller in both dimensions. Because every output requires pixels around it, it is impossible to calculate the output for the edge pixels. In this example we are simply leaving the edge output pixels as the values they are initialized to.</p>
<p>Sometimes it is desirable for the output signal to be the same size as the input signal. In that case, "padding" must be applied to the input to account for the fact that applying the filter, by its nature, reduces the size. Strategies for padding differ, but for images a common choice is to repeat the boundary values (i.e. the outermost set of pixels) on all sides or (in some cases) on just some sides.</p>
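<p>As an illustration of the "repeat the boundary values" strategy, here is a minimal C sketch. It is not part of the sample (which does no padding): the names <em>clamp_int</em> and <em>pad_replicate</em> are hypothetical, and the image is assumed to be a row-major array of floats.</p>

```c
#include <stddef.h>

/* Clamp v to the inclusive range [lo, hi]. */
static int clamp_int(int v, int lo, int hi)
{
    if (v < lo)
    {
        return lo;
    }
    if (v > hi)
    {
        return hi;
    }
    return v;
}

/*
 * Hypothetical helper (not part of the SDK sample).
 * Copies a width x height image into a buffer of size
 * (width + 2 * pad) x (height + 2 * pad), repeating the outermost
 * pixels of the input into the added border.
 */
void pad_replicate(const float *input, int width, int height,
                   int pad, float *padded)
{
    const int padded_width = width + 2 * pad;
    const int padded_height = height + 2 * pad;

    for (int y = 0; y < padded_height; y++)
    {
        /* Border rows map back onto the first or last input row. */
        const int src_y = clamp_int(y - pad, 0, height - 1);
        for (int x = 0; x < padded_width; x++)
        {
            /* Border columns map back onto the first or last input column. */
            const int src_x = clamp_int(x - pad, 0, width - 1);
            padded[(size_t)y * padded_width + x] =
                input[(size_t)src_y * width + src_x];
        }
    }
}
```

<p>For a 3x3 filter, a border of one pixel on each side (<em>pad</em> = 1) is enough for the filtered output to have the same dimensions as the original image.</p>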
<h2><a class="anchor" id="firMali"></a>
Mali-T600 Series Hardware</h2>
<p>Mali-T600 series GPU pipelines provide true IEEE-754 single-precision floating-point math in hardware. We recommend using 128-bit wide vectors. For more information about vectorization, see <a class="el" href="hello_world_vector_tutorial.html">Vectorizing your OpenCL code</a>.</p>
<p>In this sample, the calculations use 32-bit floating point numbers. One 128-bit vector holds four 32-bit floating point numbers. Therefore, using float4 vectors makes maximum use of the hardware.</p>
<p><b>We recommend the use of vectors wherever possible when using a Mali-T600 series GPU.</b></p>
<h2><a class="anchor" id="firSpecifics"></a>
Implementation Specifics</h2>
<p>We implement FIR filtering on a single 8-bit channel for simplicity. To do FIR filtering on RGB images, you can run the FIR filter on each channel separately and then combine the results. In this sample we take an RGB image, convert it to an 8-bit luminance image and send it to the GPU.</p>
<p>Each FIR calculation gives an output for the centre pixel of the mask. The output value of the centre pixel is the sum of the pixel values in a 3x3 grid around the pixel, multiplied by the coefficients. This can be split into three stages by doing the summations for each row of the grid separately. This sample does the calculations one row at a time. However, instead of operating on one pixel at a time, it uses vectors of 4 pixels at once.</p>
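<p>The RGB to luminance conversion mentioned above could look like the following C sketch. The weights are an assumption: this page does not state which formula the sample uses, so the sketch uses a common integer approximation of the Rec. 601 luma weights (0.299, 0.587, 0.114). The function name <em>rgb_to_luminance</em> and the interleaved R, G, B byte layout are also hypothetical.</p>

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the SDK sample).
 * Converts interleaved 8-bit RGB pixels to 8-bit luminance.
 * The weights 77/256, 150/256 and 29/256 are an assumed integer
 * approximation of the Rec. 601 luma weights. They sum to 256/256,
 * so the result always fits in 8 bits.
 */
void rgb_to_luminance(const uint8_t *rgb, size_t pixel_count,
                      uint8_t *luminance)
{
    for (size_t i = 0; i < pixel_count; i++)
    {
        const unsigned int r = rgb[3 * i + 0];
        const unsigned int g = rgb[3 * i + 1];
        const unsigned int b = rgb[3 * i + 2];

        /* Weighted sum, then divide by 256 with a shift (truncating). */
        luminance[i] = (uint8_t)((77u * r + 150u * g + 29u * b) >> 8);
    }
}
```

<p>Because the weights sum to the common denominator, a white pixel (255, 255, 255) maps to 255 and a black pixel maps to 0, which is the same property required of the FIR coefficients.</p>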
<h2><a class="anchor" id="firFilter"></a>
The Filter</h2>
<p>Each instance of the kernel applies a 3x3 FIR filter with constant coefficients (weights) to a 6x3 window in the input luminance image to produce a 4x1 window in the output image.</p>
<p>The input and output images are represented using arrays of floating point numbers.</p>
<p>The FIR coefficients are defined as constant floats and organized as follows:</p>
<pre class="fragment">FW_UL FW_UM FW_UR
FW_CL FW_CM FW_CR
FW_BL FW_BM FW_BR
</pre><p>In a real application, these values can be derived in a number of different ways depending on the intended result. For this sample, the values were chosen arbitrarily.</p>
<p>Our common coefficient denominator is 256. As explained in <a class="el" href="fir_tutorial.html#firAlgorithm">The Algorithm</a> section, the sum of the coefficient numerators must not exceed this denominator, so that each output pixel stays within the 8-bit range of the input and fits within a char.</p>
<p>The coefficients being used are: </p>
<pre class="fragment">[30 5  6 ]
[19 30 9 ] / 256
[15 5  40]
</pre><h2><a class="anchor" id="firCode"></a>
The Code</h2>
<p>Unless otherwise noted, all code snippets come from the OpenCL kernel found in <a class="el" href="fir__float_8cl.html">fir_float.cl</a>.</p>
<ol type="1">
<li><p class="startli"><b>Choosing the size of the kernel</b></p>
<p class="startli">We are using vector types in the kernel, so each kernel instance (work-item) actually outputs 4 results. See <a class="el" href="fir_tutorial.html#firMali">Mali-T600 Series Hardware</a> for more details on vectorization. We adjust the pointers into the data to reflect this: </p>
<div class="fragment"><div class="line">    <span class="comment">/*</span></div>
<div class="line"><span class="comment">     * Each kernel calculates 4 output pixels in the same row (hence the &#39;* 4&#39;).</span></div>
<div class="line"><span class="comment">     * column is in the range [0, width] in steps of 4.</span></div>
<div class="line"><span class="comment">     * row is in the range [0, height].</span></div>
<div class="line"><span class="comment">     */</span></div>
<div class="line">    <span class="keyword">const</span> <span class="keywordtype">int</span> column = get_global_id(0) * 4;</div>
<div class="line">    <span class="keyword">const</span> <span class="keywordtype">int</span> row = get_global_id(1);</div>
<div class="line">    <span class="comment">/* Offset calculates the position in the linear data for the row and the column. */</span></div>
<div class="line">    <span class="keyword">const</span> <span class="keywordtype">int</span> offset = row * width + column;</div>
</div><!-- fragment --><p> And when we enqueue the kernel in <a class="el" href="fir__float_8cpp.html">fir_float.cpp</a>, we reduce the worksize accordingly: </p>
<div class="fragment"><div class="line">    <span class="comment">/*</span></div>
<div class="line"><span class="comment">     * Each instance of the kernel operates on a 4 * 1 portion of the image.</span></div>
<div class="line"><span class="comment">     * Therefore, the global work size must be width / 4 by height / 1 work items.</span></div>
<div class="line"><span class="comment">     */</span></div>
<div class="line">    <span class="keywordtype">size_t</span> globalWorksize[2] = {width / 4, height / 1};</div>
</div><!-- fragment --></li>
<li><p class="startli"><b>Loading the input data</b></p>
<p class="startli">Here we do vector loads from one row of the data: </p>
<div class="fragment"><div class="line">    <span class="comment">/*</span></div>
<div class="line"><span class="comment">     * Access the first row in the 6x3 window to apply FW_U coefficients.</span></div>
<div class="line"><span class="comment">     * data1 can be constructed from the other vectors without doing an additional load.</span></div>
<div class="line"><span class="comment">     */</span></div>
<div class="line">    float4 data0 = vload4(0, input + offset);</div>
<div class="line">    float4 data2 = vload4(0, input + offset + 2);</div>
<div class="line">    float4 data1 = (float4)(data0.s12, data2.s12);</div>
</div><!-- fragment --></li>
<li><p class="startli"><b>Applying the filter</b></p>
<p class="startli">Then we carry out the calculation on 4 pixels. Each vector calculation can be done as a single operation on a Mali-T600 series GPU: </p>
<div class="fragment"><div class="line">    accumulator += data0 * <a class="code" href="fir__float_8cl.html#ad4c882aae33d779624d02d0d88bae565">FW_UL</a>;</div>
<div class="line">    accumulator += data1 * <a class="code" href="fir__float_8cl.html#ada3c939d265394a260f3bcf800c16d01">FW_UM</a>;</div>
<div class="line">    accumulator += data2 * <a class="code" href="fir__float_8cl.html#a86163952d447939d88b2ab454efd091a">FW_UR</a>;</div>
</div><!-- fragment --><p> We apply the same pattern to the second and third row, to accumulate the result in the <em>accumulator</em> variable: </p>
<div class="fragment"><div class="line">    <span class="comment">/* Access the second row in the 6x3 window and repeat the process, but with FW_C coefficients. */</span></div>
<div class="line">    data0 = vload4(0, input + offset + width);</div>
<div class="line">    data2 = vload4(0, input + offset + width + 2);</div>
<div class="line">    data1 = (float4)(data0.s12, data2.s12);</div>
<div class="line"></div>
<div class="line">    accumulator += data0 * <a class="code" href="fir__float_8cl.html#ac7935f8e7004ebaa9c1cb835ebb7576e">FW_CL</a>;</div>
<div class="line">    accumulator += data1 * <a class="code" href="fir__float_8cl.html#a5d47ed70581c7a693231547336cb00aa">FW_CM</a>;</div>
<div class="line">    accumulator += data2 * <a class="code" href="fir__float_8cl.html#a1a85069496abcfdd6757ca710c185c19">FW_CR</a>;</div>
<div class="line"></div>
<div class="line">    <span class="comment">/* Access the third row in the 6x3 window and repeat the process, but with FW_B coefficients. */</span></div>
<div class="line">    data0 = vload4(0, input + offset + width * 2);</div>
<div class="line">    data2 = vload4(0, input + offset + width * 2 + 2);</div>
<div class="line">    data1 = (float4)(data0.s12, data2.s12);</div>
<div class="line"></div>
<div class="line">    accumulator += data0 * <a class="code" href="fir__float_8cl.html#a436f4eeb8849d5ddf8c0fcba12a847b6">FW_BL</a>;</div>
<div class="line">    accumulator += data1 * <a class="code" href="fir__float_8cl.html#ad946956a7dc74b2b24008dfbcf6cc56f">FW_BM</a>;</div>
<div class="line">    accumulator += data2 * <a class="code" href="fir__float_8cl.html#ac2d22f5bd6ac4522b7fe5cbc1b32c566">FW_BR</a>;</div>
</div><!-- fragment --></li>
<li><p class="startli"><b>Storing the result</b></p>
<p class="startli">Finally, we store the data. We use a vector store to write out the 4 results at once: </p>
<div class="fragment"><div class="line">    <span class="comment">/* Store the accumulator. */</span></div>
<div class="line">    vstore4(accumulator, 0, output + offset);</div>
</div><!-- fragment --> </li>
</ol>
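<p>For checking results on the CPU, the kernel steps above can be written as a scalar C loop. This is a sketch under stated assumptions and not code from the sample: the names <em>FIR_WEIGHTS</em> and <em>fir_3x3_reference</em> are hypothetical, the coefficients are assumed to be the numerators listed in <a class="el" href="fir_tutorial.html#firFilter">The Filter</a> divided by 256, and the indexing mirrors the snippets shown here, where the result stored at <em>offset</em> is computed from the 3x3 window whose top-left pixel is at <em>offset</em>.</p>

```c
#include <stddef.h>

/* Assumed coefficients: the numerators from "The Filter" over 256. */
static const float FIR_WEIGHTS[3][3] = {
    {30.0f / 256.0f,  5.0f / 256.0f,  6.0f / 256.0f},
    {19.0f / 256.0f, 30.0f / 256.0f,  9.0f / 256.0f},
    {15.0f / 256.0f,  5.0f / 256.0f, 40.0f / 256.0f},
};

/*
 * Hypothetical scalar reference (not part of the SDK sample).
 * For every position whose 3x3 window lies fully inside the image,
 * writes the weighted sum of that window to the same offset in output.
 * All other output values are left untouched.
 */
void fir_3x3_reference(const float *input, float *output,
                       int width, int height)
{
    for (int row = 0; row + 2 < height; row++)
    {
        for (int column = 0; column + 2 < width; column++)
        {
            float accumulator = 0.0f;
            for (int r = 0; r < 3; r++)
            {
                for (int c = 0; c < 3; c++)
                {
                    accumulator +=
                        input[(size_t)(row + r) * width + (column + c)] *
                        FIR_WEIGHTS[r][c];
                }
            }
            output[(size_t)row * width + column] = accumulator;
        }
    }
}
```

<p>Unlike the vectorized kernel, this version processes one pixel per iteration and never reads outside the input, so it is convenient for comparing against the GPU output on small test images.</p>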
<h1><a class="anchor" id="firRunning"></a>
Running the Sample</h1>
<ol type="1">
<li><p class="startli">From a command prompt in the root of the SDK, run:</p>
<div class="fragment"><div class="line">cd samples/<a class="code" href="fir__float_8cl.html#a1bacdbaeb8f7bf3c8ac34929a28e689c" title="FIR filter kernel function.">fir_float</a></div>
</div><!-- fragment --> <div class="fragment"><div class="line">make install</div>
</div><!-- fragment --><p class="startli">This compiles the FIR float sample code and copies all the files it needs to run to the bin folder in the root directory of the SDK.</p>
</li>
<li>Copy this folder to the board.</li>
<li><p class="startli">Navigate to the folder on the board and run the FIR float binary:</p>
<div class="fragment"><div class="line">./<a class="code" href="fir__float_8cl.html#a1bacdbaeb8f7bf3c8ac34929a28e689c" title="FIR filter kernel function.">fir_float</a></div>
</div><!-- fragment --></li>
<li><p class="startli">You should see output similar to:</p>
<div class="fragment"><div class="line">Profiling information:</div>
<div class="line">Queued time:    0.057ms</div>
<div class="line">Wait time:      0.071017ms</div>
<div class="line">Run time:       0.475355ms</div>
</div><!-- fragment --><p class="startli">An output image called output.bmp should be created on the board.</p>
</li>
</ol>
<p>Find solutions for <a class="el" href="common_issues.html">Common Issues</a>.</p>
<h1><a class="anchor" id="firMoreInformation"></a>
More Information</h1>
<p>For more information have a look at the code in <a class="el" href="fir__float_8cpp.html">fir_float.cpp</a> and <a class="el" href="fir__float_8cl.html">fir_float.cl</a>. </p>
</div></div><!-- contents -->
</div><!-- doc-content -->
<!-- start footer part -->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
  <ul>
    <li class="navelem"><a class="el" href="tutorials.html">Tutorials</a></li>
    <li class="footer">
        <a href="http://www.arm.com/">(C) ARM Ltd. 2013</a>
    </li>
  </ul>
</div>
</body>
</html>
